Reflections on Logic & Probability
in the Context of Conditionals

Philip  G.  Calabrese 
Abstract. This paper discusses various controversies surrounding the meaning and use of
such conditionals as “A given B” or “If B then A” including that such Boolean fractions 1) can
non-trivially carry the standard conditional probability, 2) are truth functional but with three
rather than two truth values, 3) are logically and probabilistically non-monotonic, 4) can be
combined with operations that extend the standard Boolean operations, and 5) allow definitions
that extend Boolean deduction but do not serve as deductions themselves, thereby avoiding
the so-called paradoxes identified by E. Adams. A new theory of deduction with uncertain
conditionals  is  defined  in  terms  of  the  new  operations  on  conditionals  by  extending  the  familiar 
equations  that  define  deduction  between  Boolean  propositions.  This  leads  to  several  plausible 
forms  of  deduction  between  conditionals.  These  different  deductive  relations  on  conditionals 
give  rise  to  different  sets  of  implications.  Methods  to  determine  the  implications  of  one  or 
more  conditionals  with  respect  to  the  various  different  deductive  relations  are  described.  Three 
examples  of  deduction  with  uncertain  conditionals  are  extensively  examined  and  solved.  An 
example about an absent-minded coffee drinker contains two so-called subjunctive or counterfactual
conditionals, which pose no additional difficulty. The issue of practical computation
with  conditionals  is  addressed  and  the  use  of  information  entropy  to  cut  through  complexity  is 
discussed  and  illustrated.  Lastly  there  is  the  question  of  how  much  confidence  can  be  attached 
to  a  probability  distribution  having  maximum  entropy.  In  this  regard  the  results  of  E.  Jaynes 
concerning  the  concentration  of  distributions  at  maximum  entropy  are  described  along  with 
two  other  theoretical  approaches  to  this  problem. 

1. Introduction. Thirty-five years ago the theories of logic and of probability were conspicuously
unfinished, missing a division operation to represent conditional statements. Even today
many people still reduce all conditional statements such as “if B then A” to the unconditioned
(or universally conditioned) statement “A or not B”, the so-called “material conditional”, even
though it has long been recognized that the material conditional is of no use in estimating the
probability of A in the context of the truth of B. The latter probability is the well-known conditional
probability of A given B, the ratio of the probability of both A and B to the probability
of B. The conditional probability is never greater than, and is generally much less than, the
probability that A is true or B is false. Only when B is certain or when A is certain given the
truth of B do the two expressions yield essentially the same result. Even when B is false, they
differ since the ratio is undefined while the material conditional has probability 1. This has all
been quantified for instance in [Cal87]. Yet for purposes of doing 2-valued logic, the material
conditional works just fine. Mathematicians have long proved their theorems of the form “if B
then A” by proving that in all cases either A is true or B is false.
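
The gap between the conditional probability and the probability of the material conditional can be seen on a small example. The following is only an illustrative sketch: the fair-die universe and the names OMEGA, cond_prob and material_prob are assumptions made here, not notation from the paper.

```python
from fractions import Fraction

# Assumed illustration: one roll of a fair die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B); returns None when B is impossible."""
    if not b:
        return None
    return prob(a & b) / prob(b)

def material_prob(a, b):
    """Probability of the material conditional 'A or not B'."""
    return prob(a | (OMEGA - b))

A = frozenset({2})          # "2 comes up"
B = frozenset({2, 4, 6})    # "an even number comes up"

print(cond_prob(A, B))       # 1/3
print(material_prob(A, B))   # 2/3
```

With B certain (B equal to the whole universe) the two functions agree, and with B impossible cond_prob is undefined while material_prob returns 1, matching the cases described above.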
However, when B is uncertain or when A is uncertain given the truth of B, the material conditional
is not an appropriate Boolean proposition to represent the conditional statement “if B
then A”. Nor is there any other Boolean proposition that can serve the purpose of both logic
and probability, as shown early on by D. Lewis [Lew75]. This non-existence is reminiscent of
results throughout the history of mathematics that preceded the invention of new numbers
needed to satisfy some relationships that naturally arose. The irrational numbers were needed
to represent the length of the diagonal of a square in terms of the length of a side of that
square; complex numbers were invented to solve polynomial equations such as x² + 1 = 0; and
integer fractions were invented to have numbers that could solve equations like 3x = 20. In
each case, mathematicians didn’t stop with the declaration that there were no such numbers in
the existing system; they instead invented new numbers that included the old ones but also
solved the desired equations. The same thing has worked in the case of events and propositions
[Cal87, Cal94] and the result is no less profound. The more surprising thing is that it has
taken so long for the development to occur in the case of events and propositions. Apart from
Boole himself, such a system of ordered pairs was envisioned by a few researchers including
G. Schay [Sch68] and Z. Domotor [Dom69], but these developments didn’t go far enough in the
right direction before getting bogged down. It is now clear, however, that a system of ordered
pairs of probabilistic events or of logical propositions can be defined to represent conditional
statements, avoid the triviality results of Lewis [Lew75], and be assigned the standard conditional
probability.

The operations on the ordered pairs of events or propositions (A|B), “A given B”, have been
extensively analyzed and motivated in [Cal87, Cal94, Cal02]. Using ' to denote “not” and juxtaposition
to denote “and”, these operations on conditionals are:

a) (A|B)' = (A'|B).

That  is,  “not  (A  given  B)”  is  equivalent  to  “(not  A)  given  B”. 

b)  (A|B)  or  (C|D)  =  ((AB  or  CD)  |  (B  or  D)) 

The  right  hand  side  is  “given  either  conditional  is  applicable,  at  least  one  is  true”. 

c)  (A|B)  and  (C|D)  =  [ABD'  or  ABCD  or  B'CD]  |  (B  or  D) 

The right hand side is “given either conditional is applicable, at least one is true while the other
is not false”. It can be rewritten as ([AB(CD or D') or CD(AB or B')] | (B or D)).

d) (A|B)|(C|D) = (A | (B)(C|D))

The  right  hand  side  is  “given  B  and  (C|D)  are  not  false,  A  is  true.” 

By writing B as a conditional (B | Ω) with the universe Ω as condition, the conjunction
(B)(C|D) in d) reduces to B(C ∨ D') using operation c).
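
The four operations can be sketched on a finite universe, with events as sets of outcomes. This is a minimal illustration under stated assumptions: the class name Cond, the method names, and the six-point universe OMEGA are hypothetical choices made here, and each conditional is stored in the reduced form (AB | B), which is justified by the equivalence criterion B = D and AB = CD.

```python
# Assumed illustration: events are subsets of a six-point universe.
OMEGA = frozenset({1, 2, 3, 4, 5, 6})

def neg(event):
    """Boolean complement of an event within OMEGA."""
    return OMEGA - frozenset(event)

class Cond:
    """A conditional (A|B), stored in the reduced form (AB | B)."""

    def __init__(self, a, b=OMEGA):
        self.b = frozenset(b)
        self.a = frozenset(a) & self.b

    def __eq__(self, other):
        # (A|B) = (C|D) if and only if B = D and AB = CD
        return self.a == other.a and self.b == other.b

    def __hash__(self):
        return hash((self.a, self.b))

    def __repr__(self):
        return f"({sorted(self.a)} | {sorted(self.b)})"

    def __invert__(self):
        # a) (A|B)' = (A'|B)
        return Cond(neg(self.a), self.b)

    def __or__(self, other):
        # b) (A|B) or (C|D) = (AB or CD | B or D)
        return Cond(self.a | other.a, self.b | other.b)

    def __and__(self, other):
        # c) (A|B) and (C|D) = (ABD' or ABCD or B'CD | B or D)
        ab, cd = self.a, other.a
        num = (ab & neg(other.b)) | (ab & cd) | (neg(self.b) & cd)
        return Cond(num, self.b | other.b)

    def given(self, other):
        # d) (A|B)|(C|D) = (A | B(C or D'))
        return Cond(self.a, self.b & (other.a | neg(other.b)))

# The reduction of (B)(C|D) to B(C or D') via operation c):
B = Cond({1, 2, 3})                 # B written as (B | OMEGA)
CD = Cond({2, 5}, {2, 3, 5})        # an arbitrary (C|D)
print(B & CD == Cond({1, 2, 3} & ({2, 5} | neg({2, 3, 5}))))   # True
```

Storing the reduced numerator AB rather than A keeps equality of conditionals a simple componentwise comparison.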

This system of “Boolean fractions” (ℬ|ℬ) includes the original events or propositions ℬ as a
subsystem and also satisfies the essential needs of both logic and conditional probability. Two
conditionals  (A|B)  and  (C|D)  are  equivalent  (=)  if  and  only  if  B=D  and  AB  =  CD.  As  with  the 
past  extensions  of  existing  number  systems,  some  properties  no  longer  hold  in  the  new  system. 
For  instance,  the  new  system  is  not  wholly  distributive  as  are  Boolean  propositions. 

As with any new system of numbers there has been quite a lot of resistance to this new algebra
of conditionals. Some researchers (see [Goo91A, Hai96]), recognizing the virtue of a system
of ordered pairs of events to represent conditional events, have nevertheless disputed the
choice of extended operations on those ordered pairs. However, the operations for “or” and
“and” in [Cal87] were independent rediscoveries of the two so-called “quasi” operations for
“or” and “and” early employed by E. Adams [Ada66, Ada86], a pioneer researcher of conditionals
writing in the philosophical literature. Adams calls these operations “quasi” merely
because they are not “monotonic”. That is, combining two conditionals with “and” does not
always result in a new conditional that implies each of the component conditionals. Nor does
combining two conditionals with “or” always result in a conditional that is implied by each of
the component conditionals. This seems rather counterintuitive when considered in the
abstract because we are all so imbued with equal-condition thinking. But when two conditionals
with different conditions are combined as in operations b) or c), the result is a conditional
whose condition is the disjunction (“or”) of the two original conditions. By expanding the context
in this way probabilities have more freedom to change up or down. Deduction is also
much more complicated when dealing with conditionals with different conditions, but now a
successful extension of Boolean deduction for uncertain conditionals has been developed
[Cal90, Cal91, Cal02].

Another issue that arises with conditionals is their truth functionality. Are conditionals “true”
or “false” like ordinary propositions or events? Even the ancient Greeks were troubled by this
question. For some reason Adams seems to take the attitude [Ada98, p.65, footnote] that
“inapplicable” is not really a 3rd truth-value that can be assigned to a conditional. On the other
hand, B. De Finetti [DeF36] early asserted that a conditional has three, rather than two, truth-values:
If the condition B is true, then “A given B” is true or false depending on the truth of A.
But when B is false, De Finetti asserted that the conditional was neither true nor false, but
instead required a third truth-value, which he unfortunately identified with “unknown” and
therefore assigned a numerical value somewhere between 0 and 1. But a conditional with a
false condition is not “unknown”; it is “inapplicable”. For instance, if I am asked, “if you had
military service, in which branch did you serve?” I don’t answer “unknown”. I answer “inapplicable”
because I haven’t had military service. The question and its answer are not assigned
a truth-value between 0 and 1; they are essentially ignored. The answer “unknown” would be
appropriate from someone who thought I had military service but did not know in which branch I
served.

While it is not immediately obvious, the question of what operations are used to combine conditional
propositions is essentially equivalent to the question of which of the three truth-values
should be assigned to the nine combinations of the truth (T), falsity (F) or inapplicability (I) for
two different conditionals. See [Cal93, p.7] for a proof. This approach was taken by A. Walker
[Wal94] to determine those few operations on conditionals that satisfy natural requirements
such as being commutative and idempotent. This approach was also employed in [Cal02] to
provide careful motivations and a complete characterization of the 4 operations on conditionals
a) - d) listed above and originally grouped together in [Cal87]. Three of these operations in
the form of 3-valued truth tables were identified by B. Sobocinski [Sob52, Res69], but his 4th
operation was very different from the operation d) in [Cal87]. Similarly, Adams easily identified
the negation operation for conditionals, but passed over the 4th, iterated conditioning operation
employed here because he interprets a conditional as an implication instead of as a new
object, namely an event or proposition in a given context.
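
The correspondence between the operations and 3-valued truth tables can be made concrete by evaluating operations b) and c) at a single point of the universe. The sketch below is an illustration only; the function names tv, conj, disj and table are assumptions made here. It derives the nine-entry tables for “and” and “or” and raises an error if an operation were not truth functional.

```python
from itertools import product

def tv(a, b):
    """Truth value of (A|B) at one point, given membership in A and in B."""
    if not b:
        return "I"            # inapplicable: the condition is false
    return "T" if a else "F"

def conj(a, b, c, d):
    """Operation c) at one point: (numerator holds, condition holds)."""
    num = (a and b and not d) or (a and b and c and d) or (not b and c and d)
    return num, (b or d)

def disj(a, b, c, d):
    """Operation b) at one point: (numerator holds, condition holds)."""
    return (a and b) or (c and d), (b or d)

def table(op):
    """Build the 3-valued truth table of op, checking truth functionality."""
    result = {}
    for a, b, c, d in product([False, True], repeat=4):
        key = (tv(a, b), tv(c, d))
        value = tv(*op(a, b, c, d))
        if result.setdefault(key, value) != value:
            raise ValueError(f"not truth functional at {key}")
    return result

AND3 = table(conj)
OR3 = table(disj)

for name, tab in (("and", AND3), ("or", OR3)):
    print(name)
    for left in "TFI":
        print(" ", left, [tab[(left, right)] for right in "TFI"])
```

In both tables an inapplicable operand simply defers to the other operand, and the result is inapplicable only when both operands are.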

Recently,  Adams  reconsidered  the  issue  of  “embedded”  or  iterated  conditionals  [Ada98,  p.268] 
and  the  so-called  “import-export”  principle  which  asserts  that  ((A  |  B)  |  C)  =  (A  |  B  and  C)  for 
any  expressions  A,  B  and  C.  Operation  d)  is  a  restricted  form  of  this  principle,  which  can  be 
used  to  reduce  any  iterated  conditional  to  a  simple  conditional  with  Boolean  components.  For 
propositions  A,  B,  C,  D,  E,  and  F,  a  more  general  form  of  the  import-export  law  follows  from 
operations  a)  -  d): 


[(A|B) | (C|D)] | (E|F) = (A|B) | [(C|D)(E|F)]    (1.1)

Using “import-export”, Adams cites the following example as a counterexample to the basic
logical principle of modus ponens, that A is always a logical consequence of B and (A|B). Noting
that by import-export ((HD | H) | D) = (HD | HD), and that the latter is a logical necessity,
Adams gives the example

D and ((HD | H) | D) implies (HD | H),    (1.2)

which, according to Adams, should be valid by modus ponens. For instance, interpreting D as
“it is a dog” and H as “it is heavy (500 pounds)”, modus ponens seems to fail because the implication
(HD | H), that “it is a heavy dog given that it is heavy”, should not logically follow from
D and “(HD | H) given D”. Adams mentions three authors who each take a different direction
here, one accepting “import-export”, one accepting modus ponens, and one accepting both
with reservations about modus ponens.

But  the  difficulties  raised  by  this  example  disappear  when  it  is  remembered  that  with  modus 
ponens,  it  is  not  just  “A”  that  is  a  logical  consequence  of  “B  and  (A|B)”,  but  rather  “A  and  B” 
that  is  the  logical  consequence.  And  since  conditionals  are  not  logically  monotonic,  “A  and  B” 
does  not  necessarily  imply  “A”  alone,  as  Adams  has  elsewhere  shown.  For  conditionals,  “A 
and  B”  may  no  longer  imply  “A”  and  may  also  have  larger  probability  than  “A”  alone. 

Therefore, the logical implication of the left side of equation 1.2 is “D and (HD | H)”, which by
operation c) reduces to just D, and D is certainly a valid implication of the left side of 1.2. So
the “paradox” arises because the notion that “B and A” must logically imply B is false for conditionals.

For example, consider a single roll of a fair die with faces numbered 1 through 6. The conditional
(2 | even) representing “2 comes up given an even number comes up” has conditional
probability 1/3, and it surely logically implies itself by any intuitive concept of implication.
Now conjoin the conditional (1 or 3 | < 5), representing “1 or 3 comes up given the roll is less
than 5”, with (2 | even) and the result by operation c) is (1 or 3 | not 5), which obviously does
not logically imply (2 | even) by any intuitive concept of logical implication. Note also that (1
or 3 | not 5) has conditional probability 2/5, which is larger than 1/3, the conditional probability
of (2 | even). All of these situations have been analyzed in [Cal02]. Adams gives a similar
example [Ada98, p. 273] that can be handled in the same way.
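
The die example can be checked mechanically. The following sketch assumes a uniform distribution on the six faces and represents a conditional as a plain pair (A, B) of sets; the names conj and cond_prob are illustrative, not the paper's.

```python
from fractions import Fraction

def conj(cond1, cond2):
    """Operation c): (A|B) and (C|D) = (ABD' or ABCD or B'CD | B or D)."""
    (a, b), (c, d) = cond1, cond2
    ab, cd = a & b, c & d
    numerator = (ab - d) | (ab & cd) | (cd - b)
    return numerator, b | d

def cond_prob(cond):
    """P(A|B) for equally likely outcomes; assumes B is non-empty."""
    a, b = cond
    return Fraction(len(a & b), len(b))

two_given_even = (frozenset({2}), frozenset({2, 4, 6}))
odd_given_lt5 = (frozenset({1, 3}), frozenset({1, 2, 3, 4}))

both = conj(odd_given_lt5, two_given_even)
print(sorted(both[0]), sorted(both[1]))   # [1, 3] [1, 2, 3, 4, 6]
print(cond_prob(two_given_even))          # 1/3
print(cond_prob(both))                    # 2/5
```

The conjunction has condition “not 5” and probability 2/5, which exceeds the 1/3 of (2 | even), illustrating the probabilistic non-monotonicity described above.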

Concerning embedded conditionals, Adams claims [Ada98, p. 274] that, “So far no one has
come up with a pragmatics that corresponds to the truth-conditional or probabilistic semantics
of the theories that they propose ...”. However, Adams has too quickly passed over the 4-operation
system of Boolean fractions (conditional events) recounted here, and he has not yet
examined the additional theory of deduction defined in terms of the operations on those conditionals.

To repeat, most if not all of these so-called paradoxes of embedded conditionals and logical
deduction arise from the unwarranted identification of the conditional (A|B) with the logical
implication of A by B. Others arise by forgetting that conditionals are logically non-monotonic.
However, when (A|B) is taken as a new object and deduction is defined in terms of the
operations a) - d), these paradoxes disappear. Just as it is in general impossible to force Boolean
propositions to carry the conditional probability, so too is it impossible to force conditionals
to serve as implication relations. The latter must be separately defined in terms of, or at
least consistent with, the chosen operations on conditionals.

In Sections 2.1 and 2.2 the essentials of the theory of deduction with uncertain conditionals are
recounted, including some refinements such as Definition 2.2.4 of the “conjunction property”.
Section 2.3 provides three new illustrative examples of deduction with uncertain conditionals.
Section 2.3.1 addresses the familiar question of what can be deduced by transitivity with conditionals.
That is, what can be deduced from “A given B” and “B given C”? Section 2.3.2 analyzes
a set of three rather convoluted conditionals concerning an absent-minded coffee drinker.
Two of the three conditionals are so-called non-indicative, also called subjunctive or counterfactual,
conditionals. Such conditionals seem to pose no additional difficulty for this theory of
deduction. In Section 2.3.3 the absent-minded coffee drinker example is modified to make it a
valid deduction in the two-valued Boolean logic. The implications with respect to various
deductive relations are again determined. Section 3 addresses the issue of practical computation
of combinations of conditionals and deductions with conditionals. Section 3.1 illustrates
the difficulties and complexities of pure Bayesian analysis when applied to the “transitivity
example” of Section 2.3.1. Section 3.2 discusses the use of entropy in information processing
as a reasonable and principled way to cut through complexity and solve for unknown probabilities
and conditional probabilities. This idea has already been successfully implemented in the
computer program SPIRIT developed at Hagen University by a team headed by W. Rodder.
Section 3.3 addresses the question of the confidence that can be attached to probabilities determined
by the maximum entropy solution. In this regard the separate works of E.T. Jaynes, S.
Amari, and A. Caticha are described, especially that of Jaynes, who proves an entropy concentration
theorem that provides a statistical measure of the fraction of eligible probability distributions
whose entropy falls below a specified critical value.

2.  Deduction  with  Uncertain  Conditionals.  Deduction  for  uncertain  conditionals  must  be 
defined  in  terms  of  the  operations  a)  -  d)  on  conditionals  listed  in  the  introduction.  For 
instance,  if  (A|B)  and  (C|D)  are  two  conditionals,  we  may  wish  to  define  deduction  of  (C|D)  by 
(A|B)  to  mean  that  the  conjunction  (A|B)  (C|D)  of  the  two  conditionals  should  be  equivalent  to 
(A|B)  as  is  the  case  with  Boolean  propositions.  Recall  that  for  Boolean  propositions  p  implies 
q  can  be  defined  with  the  conjunction  operation  by  the  equation  “p  and  q  =  p”.  Alternately,  we 
could  use  the  disjunction  operation  and  define  this  same  implication  as  “p  or  q  =  q”.  Still  other 
ways  exist  such  as  “q  or  not  p  =  1  (true)”.  Surprisingly,  in  the  realm  of  conditionals  none  of 
these  definitions  of  implication  are  equivalent  to  one  another!  This  has  all  been  extensively 
developed in [Cal90, Cal91, Cal94] and especially [Cal02]. This development will be summarized
and streamlined in Sections 2.1 and 2.2.

2.1 Deductive Relations. The expression “B ≤ A” is used to signify “B implies A” because
for Boolean propositions this implication is equivalent to saying that “the instances of B are a
subset of the instances of A”. This is also the appropriate interpretation in case A and B
are probabilistic events. Some readers may wish to mentally substitute the entailment arrow ⇒
for ≤ to connote deduction.

Definition 2.1.1. An implication or deductive relation, ≤, on conditionals is a reflexive and
transitive relation on the set of conditionals.

For instance, one such deductive relation is

(A|B) ≤_bo (C|D) if and only if B = D and AB ≤ CD    (2.1.1)

That is, conditional (A|B) implies conditional (C|D) with respect to this deductive relation if
and only if the conditions B and D are equivalent propositions or events, and within this common
condition, proposition A implies proposition C. This is called Boolean deduction because
it is just ordinary Boolean deduction when applied to conditionals with the same condition, and
a conditional can only imply another conditional provided they have equivalent conditions.

Using conjunction (∧) to define implication yields Conjunctive Implication (≤_∧):

(A|B) ≤_∧ (C|D) if and only if (A|B) ∧ (C|D) = (A|B)    (2.1.2)

For ≤_∧ the conjunction of two or more conditionals always implies each of its components.

Using disjunction (∨) to define implication yields Disjunctive Implication (≤_∨):

(A|B) ≤_∨ (C|D) if and only if (A|B) ∨ (C|D) = (C|D)    (2.1.3)

For ≤_∨ the disjunction of two or more conditionals is always implied by each of the component
conditionals.

Applying the material conditional equation “q or not p = 1” to conditionals yields what is
called Probabilistically Monotonic Implication (≤_pm):

(A|B) ≤_pm (C|D) if and only if (C|D) ∨ (A|B)' = (Ω | D ∨ B)    (2.1.4)

For ≤_pm, any conditional (C|D) implied by (A|B) has conditional probability no less than
P(A|B). Here, the universal proposition is denoted “1” and the universal event is Ω.

In [Cal91, Cal02] the defining equations on the right side of the definitions (2.1.1) - (2.1.4) have
been reduced to Boolean deductive relations between the component Boolean propositions.
For instance, 2.1.2 reduces to the two Boolean implications (A ∨ B' ≤ C ∨ D') and (B' ≤ D');
2.1.3 reduces to (AB ≤ CD) and (B ≤ D); and 2.1.4 reduces to (A ∨ B' ≤ C ∨ D') and (AB ≤
CD). Thus between two conditionals (A|B) and (C|D) four elementary Boolean deductive relations
arise: B ≤ D, AB ≤ CD, A ∨ B' ≤ C ∨ D' and B' ≤ D'. What is implied by these implication
relations is applicability, truth, non-falsity and inapplicability respectively. They have
been denoted ≤_ap, ≤_tr, ≤_nf and ≤_ip respectively, where “ap” means “applicable”, “tr” means
“truth”, “nf” means “non-falsity” and “ip” means “inapplicable”. This leads to a hierarchy of
deductive relations on conditionals as one, two, three or all four of these different Boolean
relations are assumed necessary for a deductive relation (A|B) ≤_x (C|D) to hold between two
conditionals (A|B) and (C|D). See Figure 2.1. Actually, except for ≤_bo, all of these deductive
relations can be defined in terms of just one or two of the four elementary ones because, for
instance, the combined properties of ≤_nf and ≤_ap are equivalent to those of ≤_m∨ (since B ≤ D
and A ∨ B' ≤ C ∨ D' together give AB ≤ D(C ∨ D') = CD). Similarly, the
combined properties of ≤_tr and ≤_ip are equivalent to those of ≤_m∧.
Figure 2.1 Hierarchy of Implications (Deductive Relations) for Conditionals

Trivial Implications
  1 - Implication of Identity (≤_1): (a|b) ≤_1 (c|d) iff (a|b) = (c|d)
  0 - Universal Implication (≤_0): (a|b) ≤_0 (c|d) for all (a|b) & (c|d)

Elementary Implications
  tr - Implication of Truth (≤_tr): (a|b) ≤_tr (c|d) iff ab ≤ cd
  nf - Implication of Non-Falsity (≤_nf): (a|b) ≤_nf (c|d) iff (a ∨ b') ≤ (c ∨ d')
  ap - Implication of Applicability (≤_ap): (a|b) ≤_ap (c|d) iff b ≤ d
  ip - Implication of Inapplicability (≤_ip): (a|b) ≤_ip (c|d) iff d ≤ b

Two Elementaries Combined
  ∨ - Disjunctive Implication (≤_∨): (a|b) ≤_∨ (c|d) iff b ≤ d and ab ≤ cd
  pm - Probabilistically Monotonic Implication (≤_pm): (a|b) ≤_pm (c|d) iff ab ≤ cd and (a ∨ b') ≤ (c ∨ d')
  ∧ - Conjunctive Implication (≤_∧): (a|b) ≤_∧ (c|d) iff d ≤ b and (a ∨ b') ≤ (c ∨ d')
  ec - Implication of Equal Conditions (≤_ec): (a|b) ≤_ec (c|d) iff b = d

Three Elementaries Combined
  m∨ - (Probabilistically) Monotonic and Applicability Implication (≤_m∨)
  m∧ - (Probabilistically) Monotonic and Inapplicability Implication (≤_m∧)

Four Elementaries Combined
  bo - Boolean Deduction (ℬ | fixed b) (≤_bo): (a|b) ≤_bo (c|d) iff b = d and ab ≤ cd
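
The stated reductions of definitions 2.1.2 - 2.1.4 to elementary Boolean relations can be confirmed by exhaustive search on a small universe. The following is a sketch under assumptions: the three-point universe, the reduced-pair representation (AB, B), and all function names are choices made here for illustration, and a finite check is of course not a proof.

```python
from itertools import combinations

OMEGA = frozenset({0, 1, 2})   # small universe chosen only for illustration

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Every conditional in reduced form (AB, B): 27 of them on three points.
CONDS = [(a, b) for b in subsets(OMEGA) for a in subsets(b)]

def neg(x):
    a, b = x
    return (b - a, b)                        # (A|B)' = (A'|B)

def disj(x, y):
    (a, b), (c, d) = x, y
    return (a | c, b | d)                    # operation b)

def conj(x, y):
    (a, b), (c, d) = x, y
    return ((a - d) | (a & c) | (c - b), b | d)   # operation c)

def non_false(x):
    a, b = x
    return a | (OMEGA - b)                   # A or B'

# The four elementary relations.
def tr(x, y): return x[0] <= y[0]                    # AB <= CD
def nf(x, y): return non_false(x) <= non_false(y)    # A v B' <= C v D'
def ap(x, y): return x[1] <= y[1]                    # B <= D
def ip(x, y): return y[1] <= x[1]                    # B' <= D'

# Implications defined by equations (2.1.2) - (2.1.4).
def conj_impl(x, y): return conj(x, y) == x
def disj_impl(x, y): return disj(x, y) == y
def pm_impl(x, y):
    top = x[1] | y[1]
    return disj(y, neg(x)) == (top, top)     # equals (Omega | D v B)

def check_reductions():
    """Compare each defining equation with its elementary reduction."""
    for x in CONDS:
        for y in CONDS:
            assert conj_impl(x, y) == (nf(x, y) and ip(x, y))
            assert disj_impl(x, y) == (tr(x, y) and ap(x, y))
            assert pm_impl(x, y) == (tr(x, y) and nf(x, y))
            assert not (nf(x, y) and ap(x, y)) or tr(x, y)
            assert not (tr(x, y) and ip(x, y)) or nf(x, y)
    return len(CONDS) ** 2

print(check_reductions(), "pairs checked")
```

The same enumeration also exhibits pairs for which conjunctive implication holds while disjunctive implication fails, so the definitions are not equivalent for conditionals.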
2.2 Deductively Closed Sets of Conditionals. Having defined the idea of a deductive relation
on conditionals it is now possible to define the set of implications of a set of conditionals with
respect to such a deductive relation.

Definition 2.2.1. A subset ℋ of conditionals is said to be a deductively closed set (DCS) with
respect to a deductive relation ≤_x if and only if ℋ has both of the following properties:

If (A|B) ∈ ℋ and (C|D) ∈ ℋ then (A|B) ∧ (C|D) ∈ ℋ, and

If (A|B) ∈ ℋ and (A|B) ≤_x (C|D) then (C|D) ∈ ℋ.

A set of conditionals with the first property is said to have the conjunction property and a set of
conditionals satisfying the second property is said to have the deduction property.

The  following  theorem  states  that  the  intersection  of  two  DCS’s  with  respect  to  two  different 
deductive  relations  is  a  DCS  with  respect  to  the  deductive  relation  formed  by  combining  the 
requirements  of  those  two  deductive  relations. 

Theorem 2.2.2. Conjunction Theorem for Deductively Closed Sets with respect to two
Deductive Relations. If ℋ_x is a deductively closed set of conditionals with respect to a deductive
relation ≤_x, and ℋ_y is a deductively closed set of conditionals with respect to a deductive
relation ≤_y, then the intersection ℋ_x ∩ ℋ_y is a DCS with respect to the combined deductive
relation ≤_{x∩y} defined by:

(A|B) ≤_{x∩y} (C|D) if and only if (A|B) ≤_x (C|D) and (A|B) ≤_y (C|D).

The proof is very straightforward, including showing that ≤_{x∩y} is a deductive relation. However,
in general not all DCS’s with respect to ≤_{x∩y} are intersections of DCS’s with respect to
the component deductive relations ≤_x and ≤_y.

Definition 2.2.3. Deductive Implications of a set J of conditionals. If J is any subset of conditionals,
ℋ_x(J) will denote the smallest deductively closed subset with respect to ≤_x that
includes J. We say that ℋ_x(J) is the deductive extension of J with respect to ≤_x, or that J generates
or implies ℋ_x(J) with respect to ≤_x. A DCS is principal if it is generated by a single conditional.

Definition 2.2.4. Conjunction Property for Deductive Relations. A deductive relation ≤_x
has the conjunction property if and only if

(A|B) ≤_x (C|D) and (A|B) ≤_x (E|F) implies (A|B) ≤_x (C|D) ∧ (E|F).

(Note: this is different from the conjunction property satisfied by a set of conditionals.)

Theorem 2.2.5. Principal Deductively Closed Sets. With respect to any deductive relation
≤_x having the conjunction property the deductively closed set generated by a single conditional
(A|B) is the set of conditionals that subsume it with respect to the deductive relation. That is,
ℋ_x{(A|B)} = {(Y|Z): (A|B) ≤_x (Y|Z)}. ℋ_x{(A|B)} will be denoted by ℋ_x(A|B).

Proof of Theorem 2.2.5. ℋ_x(A|B) has the conjunction property. For suppose that (C|D) and
(E|F) are in ℋ_x(A|B). So (A|B) ≤_x (C|D) and (A|B) ≤_x (E|F). Therefore (A|B) ≤_x (C|D)(E|F), by
the conjunction property of ≤_x. So (C|D)(E|F) ∈ ℋ_x(A|B). ℋ_x(A|B) obviously also has the
deduction property by the transitivity of any deductive relation ≤_x. Therefore ℋ_x(A|B) is a
DCS of conditionals. Clearly any DCS containing (A|B) must also include ℋ_x(A|B). So
ℋ_x(A|B) is the smallest DCS containing (A|B).
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Theorem  2.2.6.  The  four  elementary  deductive  relations  <np  and 

and  their  combinations,  have  the  conjunction  property  of  Definition  2.2.4. 


<,p  on  conditionals 


Proof of Theorem 2.2.6. Suppose that $(A|B) \le_{ap} (C|D)$ and $(A|B) \le_{ap} (E|F)$. So $B \le D$ and $B \le F$. So $B \le (D \wedge F) \le (D \vee F)$. Therefore $(A|B) \le_{ap} (C|D) \wedge (E|F) = (CDF' \vee D'EF \vee CDEF \mid D \vee F)$ because $B \le D \vee F$.

Suppose next that $(A|B) \le_{tr} (C|D)$ and $(A|B) \le_{tr} (E|F)$. So $AB \le CD$ and $AB \le EF$. Therefore $(A|B) \le_{tr} (C|D) \wedge (E|F)$ because $AB \le (CD) \wedge (EF) \le (CDF' \vee D'EF \vee CDEF) \wedge (D \vee F)$.

Suppose next that $(A|B) \le_{nf} (C|D)$ and $(A|B) \le_{nf} (E|F)$. So $(A \vee B') \le (C \vee D')$ and $(A \vee B') \le (E \vee F')$. Therefore $(A|B) \le_{nf} (C|D) \wedge (E|F)$ because $(A \vee B') \le (C \vee D') \wedge (E \vee F') = (CD \vee D') \wedge (EF \vee F') = (CDEF \vee D'EF \vee CDF') \vee D'F'$, which is just $(CDF' \vee D'EF \vee CDEF) \vee (D \vee F)'$.

Fourthly, suppose that $(A|B) \le_{ip} (C|D)$ and $(A|B) \le_{ip} (E|F)$. So $B' \le D'$ and $B' \le F'$. Therefore $(A|B) \le_{ip} (C|D) \wedge (E|F)$ because $B' \le D' \wedge F' = (D \vee F)'$.

Finally, suppose that $(A|B) \le_{x \cap y} (C|D)$ and $(A|B) \le_{x \cap y} (E|F)$ where x and y are in {ap, tr, nf, ip}. So $(A|B) \le_x (C|D)$ and $(A|B) \le_y (C|D)$ and $(A|B) \le_x (E|F)$ and $(A|B) \le_y (E|F)$. Therefore $(A|B) \le_x (C|D)$ and $(A|B) \le_x (E|F)$ and so $(A|B) \le_x (C|D) \wedge (E|F)$. Similarly $(A|B) \le_y (C|D) \wedge (E|F)$. Therefore $(A|B) \le_{x \cap y} (C|D) \wedge (E|F)$.


Corollary 2.2.7. If $\le_x$ is one of the elementary deductive relations $\le_{ap}$, $\le_{tr}$, $\le_{nf}$ and $\le_{ip}$, or a deductive relation combining two or more of these, then the DCS generated by (A|B) with respect to $\le_x$ is $\mathcal{H}_x(A|B) = \{(Y|Z) : (A|B) \le_x (Y|Z)\}$.


Proof of Corollary 2.2.7. The proof follows immediately from Theorems 2.2.5 and 2.2.6.

These results allow the principal DCS's with respect to the four elementary deductive relations and their combinations to be explicitly expressed in terms of Boolean relations. See [Cal02] for details. For instance, $\mathcal{H}_{ap}(A|B)$ = {(Y|Z): Y any event or proposition and Z any event or proposition with $B \le Z$} = {$(Y \mid B \vee Z)$: Y and Z any events or propositions}. For the elementary deductive relations these solutions are

$\mathcal{H}_{ap}(A|B) = \{(Y \mid B \vee Z) : Y, Z \in \mathcal{B}\}$  (2.2.1)

$\mathcal{H}_{tr}(A|B) = \{(AB \vee Y \mid AB \vee Z) : Y, Z \in \mathcal{B}\}$  (2.2.2)

$\mathcal{H}_{nf}(A|B) = \{(AB \vee B' \vee Y \mid Z) : Y, Z \in \mathcal{B}\}$  (2.2.3)

$\mathcal{H}_{ip}(A|B) = \{(Y \mid BZ) : Y, Z \in \mathcal{B}\}$  (2.2.4)
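Formulas (2.2.1) to (2.2.4) can be compared against the defining inequalities by brute force. The sketch below is one way to do so, under two assumptions that are not stated in this section: events are subsets of a hypothetical three-atom universe, and a conditional (Y|Z) is identified with (YZ|Z), which is the identification the later examples rely on when they simplify conclusions under a condition. All function names are illustrative.

```python
from itertools import combinations

ATOMS = frozenset(range(3))  # hypothetical three-atom universe
EVENTS = [frozenset(c) for r in range(len(ATOMS) + 1)
          for c in combinations(sorted(ATOMS), r)]


def neg(e):
    """Complement of an event."""
    return ATOMS - e


def norm(cond):
    """Canonical form, assuming (Y|Z) and (YZ|Z) denote the same conditional."""
    y, z = cond
    return (y & z, z)


def leq(x, p, q):
    """(A|B) <=_x (C|D), coded from the inequalities in the proof of Theorem 2.2.6."""
    (a, b), (c, d) = p, q
    return {"ap": b <= d,
            "tr": (a & b) <= (c & d),
            "nf": (a | neg(b)) <= (c | neg(d)),
            "ip": neg(b) <= neg(d)}[x]


def dcs_by_relation(x, p):
    """Principal DCS as {(Y|Z) : p <=_x (Y|Z)} (Theorem 2.2.5)."""
    return {norm((y, z)) for y in EVENTS for z in EVENTS if leq(x, p, (y, z))}


def dcs_by_formula(x, p):
    """Principal DCS as given by formulas (2.2.1)-(2.2.4)."""
    a, b = p
    ab = a & b
    shape = {"ap": lambda y, z: (y, b | z),
             "tr": lambda y, z: (ab | y, ab | z),
             "nf": lambda y, z: (ab | neg(b) | y, z),
             "ip": lambda y, z: (y, b & z)}[x]
    return {norm(shape(y, z)) for y in EVENTS for z in EVENTS}


def formulas_agree():
    """True if both descriptions coincide for every conditional in the algebra."""
    return all(dcs_by_relation(x, (a, b)) == dcs_by_formula(x, (a, b))
               for x in ("ap", "tr", "nf", "ip")
               for a in EVENTS for b in EVENTS)
```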

The following result allows the principal DCS's of the deductive relations formed by combining two or more of the elementary deductive relations to be expressed as an intersection of principal DCS's of the elementary deductive relations. This result does not extend to DCS's generated by a set of conditionals.

Theorem 2.2.8. The principal DCS $\mathcal{H}_{x \cap y}(A|B)$ of a single conditional (A|B) with respect to a combination deductive relation $\le_{x \cap y}$ is the intersection of the DCS's with respect to the component deductive relations $\le_x$ and $\le_y$. That is, $\mathcal{H}_{x \cap y}(A|B) = \mathcal{H}_x(A|B) \cap \mathcal{H}_y(A|B)$.

Proof of Theorem 2.2.8. $\mathcal{H}_{x \cap y}(A|B) = \{(C|D) : (A|B) \le_{x \cap y} (C|D)\} = \{(C|D) : (A|B) \le_x (C|D) \text{ and } (A|B) \le_y (C|D)\} = \mathcal{H}_x(A|B) \cap \mathcal{H}_y(A|B)$.

Using the formulas for the principal DCS's with respect to the elementary deductive relations, the principal DCS's with respect to the combined deductive relations have been calculated in [Cal02]. For the deductive relations mentioned above, the principal DCS's are:

$\mathcal{H}_{\vee}(A|B) = \{(AB \vee Y \mid B \vee Z) : Y, Z \in \mathcal{B}\}$  (2.2.5)

$\mathcal{H}_{pm}(A|B) = \{(AB \vee B' \vee Y \mid AB \vee Z) : Y, Z \in \mathcal{B}\}$  (2.2.6)

$\mathcal{H}_{\wedge}(A|B) = \{(AB \vee Y \mid BZ) : Y, Z \in \mathcal{B}\}$  (2.2.7)

With the principal DCS's of the elementary deductive relations and their combinations described, these results can be used to describe the DCS's of a set of conditionals with respect to these deductive relations.

For Boolean deduction, the implications of a finite set of propositions or events are simply the implications of the single proposition or event formed by conjoining the members of that initial finite set. One of the counter-intuitive features of deduction with a set of conditionals is the necessity of considering the deductive implications of all possible conjunctions of the members of that initial set of conditionals.


Definition  2.2.10.  Conjunctive  Closure  of  a  Set  of  Conditionals.  If  J  is  a  set  of  conditionals 
then  the  conjunctive  closure  C(J)  of  J  is  the  set  of  all  conjunctions  of  any  finite  subset  of  J. 
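Definition 2.2.10 translates into a short enumeration. The sketch below assumes that atoms are truth assignments to three illustrative variables, that "finite subset" means non-empty finite subset, and that conjunction is the operation $(C|D) \wedge (E|F) = (CDF' \vee D'EF \vee CDEF \mid D \vee F)$ quoted in the proof of Theorem 2.2.6. The names `conjunctive_closure`, `norm` and `conj` are hypothetical.

```python
from functools import reduce
from itertools import combinations, product

# Atoms are truth assignments (a, b, c) to three illustrative variables.
ATOMS = frozenset(product((0, 1), repeat=3))
A, B, C = (frozenset(w for w in ATOMS if w[i]) for i in range(3))


def neg(e):
    """Complement of an event."""
    return ATOMS - e


def norm(cond):
    """Canonical form, assuming (Y|Z) and (YZ|Z) denote the same conditional."""
    y, z = cond
    return (y & z, z)


def conj(p, q):
    """(C|D) ^ (E|F) = (CDF' v D'EF v CDEF | D v F)."""
    (c, d), (e, f) = p, q
    concl = (c & d & neg(f)) | (neg(d) & e & f) | (c & d & e & f)
    return (concl, d | f)


def conjunctive_closure(J):
    """Conjunctions of every non-empty subset of the finite set J."""
    J = list(J)
    return {norm(reduce(conj, subset))
            for r in range(1, len(J) + 1)
            for subset in combinations(J, r)}


# Example: J = {(A|B), (B|C)} yields three conditionals.
closure = conjunctive_closure([(A, B), (B, C)])
```

For this J the third member of the closure is (AB | B v C), the conjunction used in the transitivity example of Section 2.3.1.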


Theorem 2.2.11. Deduction Theorem. For all the elementary deductive relations $\le_x$ and their combinations, except for [illegible] and $\le_\vee$, the DCS with respect to $\le_x$ of a set J of conditionals is the set of all conditionals implied with respect to $\le_x$ by some member of the conjunctive closure C(J) of J. That is,

$\mathcal{H}_x(J) = \{(Y|Z) : (A|B) \le_x (Y|Z),\ (A|B) \in C(J)\}$

For  a  proof  see  subsection  3.4.3  of  [Cal02]. 


Corollary 2.2.12. Under the hypotheses of the Deduction Theorem, it follows from Theorem 2.2.5 (Principal Deductively Closed Sets) that

$\mathcal{H}_x(J) = \bigcup_{(A|B) \in C(J)} \mathcal{H}_x(A|B)$

That is, the deductively closed set with respect to $\le_x$ generated by a subset J of conditionals is the set of all conditionals implied with respect to $\le_x$ by some member of the conjunctive closure C(J) of J.

For most deductive relations it is necessary in general to first determine the conjunctive closure C(J) of a finite set of conditionals J in order to determine the DCS $\mathcal{H}_x(J)$ of J. However for the non-falsity, inapplicability and conjunctive deductive relations, that is for $x \in \{nf, ip, \wedge\}$, the DCS of J is $\mathcal{H}_x(J) = \mathcal{H}_x(A|B)$, where (A|B) is the single conditional formed by conjoining all the conditionals in J.

Corollary 2.2.13. With respect to the three deductive relations $\le_{nf}$, $\le_{ip}$, and $\le_\wedge$, the DCS of a finite set of conditionals J is principal and is generated by the single conditional formed by conjoining all the conditionals in J.


Proof of Corollary 2.2.13. Let $x \in \{nf, ip, \wedge\}$, and suppose (A|B) is the conjunction of all the conditionals in the set J of conditionals. Then $\mathcal{H}_x(J) = \mathcal{H}_x(A|B)$ because, with respect to $\le_x$, $(A|B) \le_x (Y|Z)$ for all (Y|Z) in C(J). This follows from the fact, which is easily checked, that for these deductive relations the conjunction of two conditionals always implies each of the component conditionals.


2.3 Examples of Deduction with Uncertain Conditionals. In [Cal02] the implications of the three well known “penguin postulates” have been completely described with respect to the elementary deductive relations and their combinations. In this section two more examples will be given. First the implications of the set J of the two conditionals {(A|B), (B|C)} will be determined. Of interest is the conditional (A|C), which is clearly true when the initial two conditionals are certainties, but may be false when one or the other is uncertain. We are often interested in chaining deductions and inferences in this way. What are the implications and inferences to be made from knowing “A given B” and “B given C”, allowing for the lack of certainty of these conditionals?

2.3.1 Transitivity Example. Consider the set J consisting of two uncertain conditionals (A|B) and (B|C). Then the conjunctive closure C(J) = {(A|B), (B|C), (A|B)(B|C)} = {(A|B), (B|C), $(AB \mid B \vee C)$}. For $x \in \{nf, ip, \wedge\}$, by Corollary 2.2.13 on principal DCS's, the DCS generated by J is $\mathcal{H}_x(J) = \mathcal{H}_x(AB \mid B \vee C)$. So using equations 2.2.3, 2.2.4, and 2.2.7,

$\mathcal{H}_{ip}(J) = \{(Y \mid (B \vee C)Z) : Y, Z \in \mathcal{B}\}$

$\mathcal{H}_{nf}(J) = \{(AB \vee B'C' \vee Y \mid Z) : Y, Z \in \mathcal{B}\}$

$\mathcal{H}_{\wedge}(J) = \{(AB \vee Y \mid (B \vee C)Z) : Y, Z \in \mathcal{B}\}$

Notice that $(A|C) \in \mathcal{H}_{nf}(J)$ by setting $Y = AB'$ and $Z = C$. In that case $(AB \vee B'C' \vee Y \mid Z) = (AB \vee AB' \vee B'C' \mid C) = (A \vee B'C' \mid C) = (A|C)$. Thus with respect to the non-falsity deductive relation $\le_{nf}$, the conditional (A|C), as expected, is implied by (A|B) and (B|C). When (A|B) and (B|C) are non-false then so is (A|C). $\mathcal{H}_{nf}(J)$ is the set of all conditionals whose conclusion includes the truth of (A|B) and also the inapplicability of both (A|B) and (B|C). By similar arguments (A|C) is in $\mathcal{H}_{ip}(J)$ and also in $\mathcal{H}_{\wedge}(J)$.

For the elementary deductive relations $\le_x$ or some combination of them, except for [illegible] and $\le_\vee$, by Corollary 2.2.12 the DCS generated by J is $\mathcal{H}_x(J) = \mathcal{H}_x(A|B) \cup \mathcal{H}_x(B|C) \cup \mathcal{H}_x(AB \mid B \vee C)$.

Now let x = pm. That is, consider the deductions of J with respect to the probabilistically monotonic deductive relation $\le_{pm}$. Since $(AB \mid B \vee C) \le_{pm} (A|B)$, therefore $\mathcal{H}_{pm}(AB \mid B \vee C) \supseteq \mathcal{H}_{pm}(A|B)$. Thus, $\mathcal{H}_{pm}(J) = \mathcal{H}_{pm}(B|C) \cup \mathcal{H}_{pm}(AB \mid B \vee C)$. So by equation 2.2.6, $\mathcal{H}_{pm}(J) = \{(BC \vee C' \vee Y \mid BC \vee Z) : Y, Z \in \mathcal{B}\} \cup \{(AB \vee B'C' \vee Y \mid AB \vee Z) : Y, Z \in \mathcal{B}\}$. Note that (A|C) is not necessarily a member of $\mathcal{H}_{pm}(J)$.

Furthermore, since $\le_{pm}$ is probabilistically monotonic, all the conditionals in $\mathcal{H}_{pm}(B|C) = \{(BC \vee C' \vee Y \mid BC \vee Z) : Y, Z \in \mathcal{B}\}$ have conditional probability no less than P(B|C), and all the conditionals in $\mathcal{H}_{pm}(AB \mid B \vee C) = \{(AB \vee B'C' \vee Y \mid AB \vee Z) : Y, Z \in \mathcal{B}\}$ have conditional probability no less than $P(AB \mid B \vee C)$.
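The membership claims of this example can be spot-checked mechanically. The sketch below works in the free Boolean algebra on A, B and C (atoms are truth assignments) and assumes, as the inequalities cited for $\le_{pm}$ in the coffee-drinker examples suggest, that $\le_{pm}$ is the combination of $\le_{tr}$ and $\le_{nf}$. That reading, and all identifiers, are assumptions of this sketch rather than definitions taken from this section.

```python
from itertools import product

# Atoms are truth assignments (a, b, c); A, B, C are the generating events.
ATOMS = frozenset(product((0, 1), repeat=3))
A, B, C = (frozenset(w for w in ATOMS if w[i]) for i in range(3))


def neg(e):
    """Complement of an event."""
    return ATOMS - e


def conj(p, q):
    """(C|D) ^ (E|F) = (CDF' v D'EF v CDEF | D v F)."""
    (c, d), (e, f) = p, q
    concl = (c & d & neg(f)) | (neg(d) & e & f) | (c & d & e & f)
    return (concl, d | f)


def leq_tr(p, q):
    (a, b), (c, d) = p, q
    return (a & b) <= (c & d)


def leq_nf(p, q):
    (a, b), (c, d) = p, q
    return (a | neg(b)) <= (c | neg(d))


def leq_ip(p, q):
    (_, b), (_, d) = p, q
    return neg(b) <= neg(d)


def leq_pm(p, q):
    """Assumed reading: truth implies truth and non-falsity implies non-falsity."""
    return leq_tr(p, q) and leq_nf(p, q)


J = [(A, B), (B, C)]
JOINT = conj(*J)                 # equals (AB | B v C)
CLOSURE = J + [JOINT]            # conjunctive closure C(J)
A_GIVEN_C = (A, C)

nf_implies = leq_nf(JOINT, A_GIVEN_C)                      # (A|C) in H_nf(J)?
ip_implies = leq_ip(JOINT, A_GIVEN_C)                      # (A|C) in H_ip(J)?
pm_implies = any(leq_pm(m, A_GIVEN_C) for m in CLOSURE)    # via Corollary 2.2.12
```

In the free algebra no member of C(J) implies (A|C) with respect to the assumed $\le_{pm}$, which is consistent with the remark that (A|C) is not necessarily in $\mathcal{H}_{pm}(J)$.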

2.3.2 Absent-minded Coffee Drinker Example. The second example, by H. Pospesel [Pos71, p.27, #78], is a typical inference problem called the “absent-minded coffee drinker”: “Since my spoon is dry I must not have sugared my coffee, because the spoon would be wet if I had stirred the coffee, and I wouldn’t have stirred it unless I had put sugar in it.”

This is not a valid argument in the 2-valued logic, but there are still deductions and inferences to be drawn from these conditional premises. Let D denote “my spoon is dry”; let G denote “I sugared my coffee”; and let R denote “I stirred my coffee”. Translating into this terminology the set of premises is J = {D, (D'|R), (R'|G')}. Therefore the conjunctive closure C(J) = {D, (D'|R), (R'|G'), D(D'|R), D(R'|G'), (D'|R)(R'|G'), D(D'|R)(R'|G')}. Using the operations on conditionals 1.1-1.4, C(J) becomes {D, (D'|R), (R'|G'), DR', $DG \vee DR'G'$, $(D'RG \vee R'G' \mid R \vee G')$, DR'}. So according to Corollary 2.2.12, for any of the elementary deductive relations $\le_x$ or their combinations, except for [illegible] and $\le_\vee$, $\mathcal{H}_x(J) = \mathcal{H}_x(D) \cup \mathcal{H}_x(D'|R) \cup \mathcal{H}_x(R'|G') \cup \mathcal{H}_x(DR') \cup \mathcal{H}_x(DG \vee DR'G') \cup \mathcal{H}_x(D'RG \vee R'G' \mid R \vee G')$.

Now this union can be simplified because some of these DCS's are included in the others. For instance, since all of these deductive relations satisfy $DR' \le D$, therefore $\mathcal{H}_x(DR') \supseteq \mathcal{H}_x(D)$. Similarly, $DR' \le DG \vee DR' = D(G \vee R') = D(G \vee R'G') = DG \vee DR'G'$. So $\mathcal{H}_x(DR') \supseteq \mathcal{H}_x(DG \vee DR'G')$. Thus, $\mathcal{H}_x(J) = \mathcal{H}_x(D'|R) \cup \mathcal{H}_x(R'|G') \cup \mathcal{H}_x(DR') \cup \mathcal{H}_x(D'RG \vee R'G' \mid R \vee G')$.
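The conjunctive closure of the coffee-drinker premises can be recomputed by enumeration. The sketch below assumes atoms are truth assignments (d, g, r) to the three propositions D, G and R, treats the unconditioned premise D as the conditional (D|1), and identifies (Y|Z) with (YZ|Z); the helper names are illustrative only.

```python
from functools import reduce
from itertools import combinations, product

# Atoms are truth assignments (d, g, r): dry spoon, sugared, stirred.
ATOMS = frozenset(product((0, 1), repeat=3))
D, G, R = (frozenset(w for w in ATOMS if w[i]) for i in range(3))
ONE = ATOMS  # the universal (certain) event


def neg(e):
    """Complement of an event."""
    return ATOMS - e


def norm(cond):
    """Canonical form, assuming (Y|Z) and (YZ|Z) denote the same conditional."""
    y, z = cond
    return (y & z, z)


def conj(p, q):
    """(C|D) ^ (E|F) = (CDF' v D'EF v CDEF | D v F)."""
    (c, d), (e, f) = p, q
    concl = (c & d & neg(f)) | (neg(d) & e & f) | (c & d & e & f)
    return (concl, d | f)


# Premises J = {D, (D'|R), (R'|G')}.
J = [(D, ONE), (neg(D), R), (neg(R), neg(G))]
CLOSURE = {norm(reduce(conj, s)) for n in (1, 2, 3) for s in combinations(J, n)}

DRY_UNSTIRRED = D & neg(R)                          # DR'
DG_OR_DRG = (D & G) | (D & neg(R) & neg(G))         # DG v DR'G'
MIXED = norm(((neg(D) & R & G) | (neg(R) & neg(G)), R | neg(G)))
```

Because DR' arises twice (from D(D'|R) and from the conjunction of all three premises), the closure has six distinct members rather than seven.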

For x = ip, nf or $\wedge$, by Corollary 2.2.13, $\mathcal{H}_x(J) = \mathcal{H}_x(D(D'|R)(R'|G')) = \mathcal{H}_x(DR')$. Therefore $\mathcal{H}_{nf}(J) = \mathcal{H}_{nf}(DR') = \{(DR' \vee Y \mid Z) : Y, Z \in \mathcal{B}\}$. That is, the implications of J, when its conditionals are regarded as non-false, are all those conditionals with any condition and whose conclusion includes the event DR', that “my spoon is dry” and “I did not stir my coffee”. Notice that G', “I did not sugar my coffee”, is not an implication of J with respect to the non-falsity deductive relation, and neither is it a valid consequence of J in the 2-valued Boolean logic. In the 2-valued logic the implications of J are the universally conditioned events that include DR', that the spoon is dry and my coffee is not stirred. But the implications with respect to the “non-falsity” deductive relation $\le_{nf}$ include all those with any other condition attached.

Similarly, by Corollary 2.2.13, $\mathcal{H}_{\wedge}(J) = \mathcal{H}_{\wedge}(DR') = \{(DR' \vee Y \mid Z) : Y, Z \in \mathcal{B}\} = \mathcal{H}_{nf}(J)$, and so in this case the implications with respect to $\le_\wedge$ are equal to the implications with respect to $\le_{nf}$.

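Turning to $\le_{pm}$, there is an additional simplification. Since $DR' \le_{pm} (D'|R)$, therefore $\mathcal{H}_{pm}(DR') \supseteq \mathcal{H}_{pm}(D'|R)$. So $\mathcal{H}_{pm}(J) = \mathcal{H}_{pm}(R'|G') \cup \mathcal{H}_{pm}(DR') \cup \mathcal{H}_{pm}(D'RG \vee R'G' \mid R \vee G')$.

By equation 2.2.6, $\mathcal{H}_{pm}(R'|G') = \{(R'G' \vee G \vee Y \mid R'G' \vee Z) : Y, Z \in \mathcal{B}\}$, and $\mathcal{H}_{pm}(DR') = \{(DR' \vee Y \mid DR' \vee Z) : Y, Z \in \mathcal{B}\}$, and $\mathcal{H}_{pm}(D'RG \vee R'G' \mid R \vee G') = \{(D'RG \vee R'G' \vee R'G \vee Y \mid D'RG \vee R'G' \vee Z) : Y, Z \in \mathcal{B}\} = \{(D'RG \vee R' \vee Y \mid D'RG \vee R'G' \vee Z) : Y, Z \in \mathcal{B}\}$.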

So the set of implications with respect to $\le_{pm}$ of J = {D, (D'|R), (R'|G')} is the union of three sets of conditionals. $\mathcal{H}_{pm}(R'|G')$ is the set of all conditionals whose condition includes my neither stirring nor sugaring my coffee and whose conclusion includes sugaring my coffee or neither sugaring nor stirring it. $\mathcal{H}_{pm}(DR')$ is the set of all conditionals whose condition and conclusion include my not stirring my coffee and my spoon being dry. $\mathcal{H}_{pm}(D'RG \vee R'G' \mid R \vee G')$ is all conditionals whose condition includes my stirring and sugaring my coffee and wetting my spoon, or neither stirring nor sugaring my coffee, and whose conclusion includes my stirring and sugaring my coffee and wetting my spoon, or not stirring my coffee.



Workshop “Conditionals, Information, and Inference”


All the conditionals in $\mathcal{H}_{pm}(R'|G')$ have conditional probability no less than P(R'|G'). Those in $\mathcal{H}_{pm}(DR')$ have conditional probability no less than P(DR'), and those conditionals in $\mathcal{H}_{pm}(D'RG \vee R'G' \mid R \vee G')$ have conditional probability no less than $P(D'RG \vee R'G' \mid R \vee G')$.

If the spoon is observed to be dry, then D = 1. So $\mathcal{H}_{pm}(DR') = \{(R' \vee Y \mid R' \vee Z) : Y, Z \in \mathcal{B}\}$ and $\mathcal{H}_{pm}(D'RG \vee R'G' \mid R \vee G') = \{(R' \vee Y \mid R'G' \vee Z) : Y, Z \in \mathcal{B}\}$, and the latter set of conditionals includes those of $\mathcal{H}_{pm}(DR')$ by setting $Z = R'G \vee W$, where W is any proposition or event in $\mathcal{B}$. Furthermore, the set $\mathcal{H}_{pm}(R'|G') = \{(R'G' \vee G \vee Y \mid R'G' \vee Z) : Y, Z \in \mathcal{B}\} = \{(R' \vee G \vee Y \mid R'G' \vee Z) : Y, Z \in \mathcal{B}\}$ is also a subset of $\{(R' \vee Y \mid R'G' \vee Z) : Y, Z \in \mathcal{B}\}$ by setting $Y = G \vee W$, where W is any proposition or event in $\mathcal{B}$.

So if my spoon is observed to be dry (D = 1), then $\mathcal{H}_{pm}(J) = \{(R' \vee Y \mid R'G' \vee Z) : Y, Z \in \mathcal{B}\}$. Thus the only conditionals implied with respect to $\le_{pm}$ by J = {D, (D'|R), (R'|G')} = {1, (0|R), (R'|G')} are $\{(R' \vee Y \mid R'G' \vee Z) : Y, Z \in \mathcal{B}\}$, namely those whose condition includes the non-stirring and non-sugaring of my coffee and whose conclusion includes the non-stirring of my coffee.

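Finally consider the implications with respect to the deductive relation $\le_{bo}$. From $\mathcal{H}_x(J) = \mathcal{H}_x(D'|R) \cup \mathcal{H}_x(R'|G') \cup \mathcal{H}_x(DR') \cup \mathcal{H}_x(D'RG \vee R'G' \mid R \vee G')$ it follows that $\mathcal{H}_{bo}(J) = \mathcal{H}_{bo}(D'|R) \cup \mathcal{H}_{bo}(R'|G') \cup \mathcal{H}_{bo}(DR') \cup \mathcal{H}_{bo}(D'RG \vee R'G' \mid R \vee G') = \{(D'R \vee Y \mid R) : Y \in \mathcal{B}\} \cup \{(R'G' \vee Y \mid G') : Y \in \mathcal{B}\} \cup \{(DR' \vee Y) : Y \in \mathcal{B}\} \cup \{(D'RG \vee R'G' \vee Y \mid R \vee G') : Y \in \mathcal{B}\}$.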

2.3.3 Absent-minded Coffee Drinker Revisited. It is interesting to see what happens with this example when the conditional (R'|G') in J is replaced by (R|G). Instead of saying “I wouldn’t have stirred my coffee unless I had put sugar in it” suppose it was “if I sugared my coffee then I stirred it.” Thus J = {D, (D'|R), (R|G)}.

In the Boolean 2-valued logic, the implications of J are those of the conjunction D(D'|R)(R|G), where the conditionals are equated to their material conditionals and have a conjunction $D(D' \vee R')(R \vee G') = DR'G'$.

More generally the conjunctive closure of J is C(J) = {D, (D'|R), (R|G), D(D'|R), D(R|G), (D'|R)(R|G), D(D'|R)(R|G)} = {D, (D'|R), (R|G), DR', $DG' \vee DRG$, $(D'RG' \vee D'RG \mid R \vee G)$, DR'G'}. Obviously, the propositions D and DR' are implications with respect to all deductive relations of DR'G', and so for all deductive relations their implications are included in $\mathcal{H}_x(DR'G')$. Hence $\mathcal{H}_x(J) = \mathcal{H}_x(D'|R) \cup \mathcal{H}_x(R|G) \cup \mathcal{H}_x(DG' \vee DRG) \cup \mathcal{H}_x(D'R \mid R \vee G) \cup \mathcal{H}_x(DR'G')$. Furthermore, $(DG' \vee DRG) = D(G' \vee RG) = D(G' \vee R) = D(R'G' \vee R) = DR'G' \vee DR$ is also an implication of DR'G'. So dropping $\mathcal{H}_x(DG' \vee DRG)$ from the union, $\mathcal{H}_x(J) = \mathcal{H}_x(D'|R) \cup \mathcal{H}_x(R|G) \cup \mathcal{H}_x(D'R \mid R \vee G) \cup \mathcal{H}_x(DR'G')$.

Note that the proposition DR'G' (having a dry spoon, unstirred coffee, and unsugared coffee), which is the conjunction of the three original conditionals of J = {D, (D'|R), (R|G)}, is an implication with respect to all these deductive relations. It is a logical consequence of J. Furthermore, by rearranging the conditioning, its probability P(DR'G') = P(D)P(R'G'|D) = P(D)P((G'|R') | D)P(R'|D) = P(D)P(G'|DR')P(R'|D). This latter product has easily estimated conditional probabilities. P(D) = 1 by observation, and both P(G'|DR') and P(R'|D) are also close to or equal to 1. This is one way the reasoning can proceed even though the initial phrasing was in terms of conditionals whose probabilities are not so easily estimated.



In addition, $(D'R \mid R \vee G) \le_{pm} (D'|R)$ because $(D'R)(R \vee G) \le (D'R)$ and $D'R \vee R'G' \le D'R \vee R'$. Thus $\mathcal{H}_{pm}(J) = \mathcal{H}_{pm}(R|G) \cup \mathcal{H}_{pm}(D'R \mid R \vee G) \cup \mathcal{H}_{pm}(DR'G')$. So $\mathcal{H}_{pm}(J) = \{(RG \vee G' \vee Y \mid RG \vee Z) : Y, Z \in \mathcal{B}\} \cup \{(D'R \vee R'G' \vee Y \mid D'R \vee Z) : Y, Z \in \mathcal{B}\} \cup \{(DR'G' \vee Y \mid DR'G' \vee Z) : Y, Z \in \mathcal{B}\}$. Furthermore, the conditionals in $\mathcal{H}_{pm}(R|G)$ all have conditional probability no less than P(R|G), and similarly for the conditionals in $\mathcal{H}_{pm}(D'R \mid R \vee G)$ and in $\mathcal{H}_{pm}(DR'G')$.

Turning to the non-falsity deductive relation, because $DR'G' \le D'R \vee R'G'$ therefore $DR'G' \le_{nf} (D'R \mid R \vee G)$, and so $\mathcal{H}_{nf}(D'R \mid R \vee G) \subseteq \mathcal{H}_{nf}(DR'G')$. Furthermore, because $DR'G' \le R \vee R'G' = R \vee G'$ therefore $DR'G' \le_{nf} (R|G)$ and so $\mathcal{H}_{nf}(R|G) \subseteq \mathcal{H}_{nf}(DR'G')$. So the implications of J with respect to $\le_{nf}$ are $\mathcal{H}_{nf}(J) = \{(DR'G' \vee Y \mid Z) : Y, Z \in \mathcal{B}\}$, namely any conditionals whose conclusion includes DR'G'.
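The two non-falsity reductions in the revisited example, and the claim that the three premises conjoin to DR'G', can be checked by enumeration. The sketch below assumes atoms are truth assignments (d, g, r), writes the unconditioned premise D as (D|1), and uses illustrative helper names throughout.

```python
from functools import reduce
from itertools import product

# Atoms are truth assignments (d, g, r): dry spoon, sugared, stirred.
ATOMS = frozenset(product((0, 1), repeat=3))
D, G, R = (frozenset(w for w in ATOMS if w[i]) for i in range(3))


def neg(e):
    """Complement of an event."""
    return ATOMS - e


def norm(cond):
    """Canonical form, assuming (Y|Z) and (YZ|Z) denote the same conditional."""
    y, z = cond
    return (y & z, z)


def conj(p, q):
    """(C|D) ^ (E|F) = (CDF' v D'EF v CDEF | D v F)."""
    (c, d), (e, f) = p, q
    concl = (c & d & neg(f)) | (neg(d) & e & f) | (c & d & e & f)
    return (concl, d | f)


def leq_nf(p, q):
    """(A|B) <=_nf (C|D) iff A v B' <= C v D'."""
    (a, b), (c, d) = p, q
    return (a | neg(b)) <= (c | neg(d))


# Revisited premises J = {D, (D'|R), (R|G)}.
J = [(D, ATOMS), (neg(D), R), (R, G)]
FULL = norm(reduce(conj, J))                 # conjunction of all three premises
DRY_PLAIN = D & neg(R) & neg(G)              # DR'G'
WET_STIRRED = norm(conj(J[1], J[2]))         # (D'|R)(R|G)
```

Here `FULL` comes out as the unconditioned event DR'G', and it implies both (D'R | R v G) and (R|G) with respect to the non-falsity relation, which is what collapses the union to a single principal DCS.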

Finally, with respect to $\le_{bo}$, from $\mathcal{H}_x(J) = \mathcal{H}_x(D'|R) \cup \mathcal{H}_x(R|G) \cup \mathcal{H}_x(D'R \mid R \vee G) \cup \mathcal{H}_x(DR'G')$ it follows that $\mathcal{H}_{bo}(J) = \mathcal{H}_{bo}(D'|R) \cup \mathcal{H}_{bo}(R|G) \cup \mathcal{H}_{bo}(D'R \mid R \vee G) \cup \mathcal{H}_{bo}(DR'G') = \{(D'R \vee Y \mid R) : Y \in \mathcal{B}\} \cup \{(RG \vee Y \mid G) : Y \in \mathcal{B}\} \cup \{(D'R \vee Y \mid R \vee G) : Y \in \mathcal{B}\} \cup \{(DR'G' \vee Y) : Y \in \mathcal{B}\}$. So the implications with respect to $\le_{bo}$ include $\mathcal{H}_{bo}(D'|R)$, those conditionals with the condition that I stirred my coffee (R) and with a conclusion that includes a non-dry spoon and stirred coffee (D'R). $\mathcal{H}_{bo}(R|G)$ is all conditionals with sugared coffee (G) as condition and with a conclusion that includes RG, stirred and sugared coffee. $\mathcal{H}_{bo}(D'R \mid R \vee G)$ is all conditionals with conclusions that include D'R and with condition $R \vee G$, of either stirred coffee or sugared coffee. $\mathcal{H}_{bo}(DR'G')$ is simply the set of all (universally unconditioned) events that include DR'G', a dry spoon and unstirred, unsugared coffee. All of these conditionals have probabilities no less than the corresponding conditional that generates them.


3. Computations with Conditionals. While the preceding sections provide an adequate theoretical basis for calculating and reasoning with conditional propositions or conditional events, the problem of the complexity of information is no less daunting. Indeed, even without the added computational burden of operating with explicit conditionals, just operating with Boolean expressions in practical situations with, say, a dozen variables, is already too complex for practical pure Bayesian analysis. The reason for this is that in most situations the available information is insufficient to determine a single probability distribution that satisfies the known constraints of the situation. Various possibilities concerning unknown dependences between subsets of variables result in complicated solutions to relatively simple problems.


3.1  Pure  Bayesian  Analysis.  For  example,  consider  again  the  transitivity  problem  of  Section 
2.3.1.  If  “A  given  B”  and  “B  given  C”  are  both  certain,  then  it  follows  that  “A  given  C”  is  also 
a  certainty.  But  if  they  are  not  certain,  then  by  pure  Bayesian  analysis,  P(A|C)  can  be  zero  no 
matter  how  high  are  the  conditional  probabilities  of  (A|B)  and  (B|C).  This  happens  because 
P(B|C)  and  P(A|B)  can  be  almost  1  while  P(A|  BC)  is  zero,  and  it  is  the  latter  probability  that 
appears  in  the  Bayesian  solution:  P(A|C)  =  P(AB  or  AB’  |  C)  =  P(AB|C)  +  P(AB’|C)  = 
P(ABC)/P(C)  +  P(AB’C)/P(C)  =  P(ABC  |  BC)  P(BC|C)  +  P(AB’C  |  B’C)  P(B’C|C)  = 
P(A|BC)P(B|C)  +  P(A|B’C)P(B’|C).  Without  knowing  anything  about  P(A|BC)  or  P(A|B’C), 
nothing  more  can  be  said  about  P(A|C). 
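A concrete distribution makes the point tangible. The joint probabilities below are hypothetical numbers chosen for illustration (they are not from the paper): both P(A|B) and P(B|C) exceed 0.9, yet P(A|C) is exactly zero because A never occurs together with C. Exact rational arithmetic is used so the decomposition can be checked without rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution over atoms (a, b, c); unlisted atoms have probability 0.
JOINT = {
    (1, 1, 0): Fraction(980, 1000),   # A and B occur, C does not
    (0, 1, 1): Fraction(10, 1000),    # B and C occur, A does not
    (0, 0, 1): Fraction(1, 1000),     # only C occurs
    (0, 0, 0): Fraction(9, 1000),     # nothing occurs
}


def prob(event):
    """Probability of the set of atoms satisfying the predicate `event`."""
    return sum((p for w, p in JOINT.items() if event(w)), Fraction(0))


def cond_prob(event, given):
    """P(event | given); assumes P(given) > 0."""
    return prob(lambda w: event(w) and given(w)) / prob(given)


def is_a(w): return w[0] == 1
def is_b(w): return w[1] == 1
def is_c(w): return w[2] == 1
def b_and_c(w): return is_b(w) and is_c(w)
def notb_and_c(w): return (not is_b(w)) and is_c(w)


p_a_given_b = cond_prob(is_a, is_b)          # 98/99
p_b_given_c = cond_prob(is_b, is_c)          # 10/11
p_a_given_c = cond_prob(is_a, is_c)          # 0

# Right-hand side of the decomposition P(A|BC)P(B|C) + P(A|B'C)P(B'|C).
decomposition = (cond_prob(is_a, b_and_c) * p_b_given_c
                 + cond_prob(is_a, notb_and_c) * (1 - p_b_given_c))
```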



3.2 Choosing a Bayesian Solution Using Maximum Information Entropy. Continuing the example of Section 3.1, knowing that C is true might dramatically change P(A|B) up or down. But if nothing is known one way or the other, the choice of the maximum information entropy distribution assumes that P(A|BC) = P(A|B). This latter equation is called the conditional independence of A and C given B. It can also be expressed as P(AC|B) = P(A|B)P(C|B) or as P(C|AB) = P(C|B). Using this principle P(A|C) = P(A|B)P(B|C) + P(A|B'C)P(B'|C). So if P(A|B) and P(B|C) are 0.9 and 0.8 respectively, then P(A|C) is at least 0.72. Additionally, since nothing is known one way or the other about the occurrence of A when B is false and C is true, this principle of “maximum indifference” implies that P(A|B'C) should be taken to be 1/2. So the term P(A|B'C)P(B'|C) contributes (1/2)P(B'|C) = (1/2)(1 - 0.8) = 0.1 to P(A|C), bringing the total to 0.82.
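The arithmetic of this example is small enough to restate as a function. The sketch below simply evaluates P(A|C) = P(A|B)P(B|C) + P(A|B'C)P(B'|C) under the conditional-independence assumption P(A|BC) = P(A|B); the function and parameter names are illustrative, and the default of 1/2 for P(A|B'C) is the "maximum indifference" choice described in the text.

```python
def p_a_given_c(p_a_given_b, p_b_given_c, p_a_given_notb_c=0.5):
    """P(A|C) = P(A|B) P(B|C) + P(A|B'C) P(B'|C), assuming P(A|BC) = P(A|B)."""
    return (p_a_given_b * p_b_given_c
            + p_a_given_notb_c * (1.0 - p_b_given_c))


# Lower bound: the second term is dropped by taking P(A|B'C) = 0.
lower_bound = p_a_given_c(0.9, 0.8, p_a_given_notb_c=0.0)

# Maximum-entropy value: P(A|B'C) is taken to be 1/2.
maxent_value = p_a_given_c(0.9, 0.8)

if __name__ == "__main__":
    print(round(lower_bound, 4), round(maxent_value, 4))
```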


In effect the principle of maximum information entropy chooses that probability distribution P
that  assumes  conditional  independence  of  any  two  variables  that  are  not  explicitly  known  to 
have  some  dependence  under  the  condition.  This  greatly  simplifies  computations  and  often 
allows  situations  of  several  dozen  variables  to  be  rapidly  analyzed  as  long  as  the  clusters  of 
dependent  variables  are  not  too  large  and  not  too  numerous.  The  maximum  entropy  solution  is 
always  one  of  the  possible  Bayesian  solutions  of  the  situation.  If  there  is  just  one  Bayesian 
solution,  then  the  two  solutions  will  always  agree. 


It is a remarkable fact that such a function as the entropy function exists, and it is now clear that it has wide application to information processing under uncertainty. If the n outcomes of some experiment are to be assigned probabilities $p_i$ for i = 1 to n subject to some set of constraints, then the distribution of probabilities that assumes conditional independence unless dependence is explicitly known is the one that maximizes the entropy function

$$H(p_1, p_2, p_3, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log p_i$$

and also satisfies the known constraints. If there is an a priori distribution $q_1, q_2, q_3, \ldots, q_n$ then H is given by

$$H(p_1, p_2, p_3, \ldots, p_n;\ q_1, q_2, q_3, \ldots, q_n) = -\sum_{i=1}^{n} p_i \log (p_i / q_i)$$

This allows maximum entropy updates when additional information is available. See J. E. Shore [Sho80] for a derivation.
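Both entropy functions are easy to evaluate numerically. The sketch below uses natural logarithms (which reproduce the value 1.7918 quoted later for a fair die) and takes the prior-relative form with a leading minus sign, so that it is a quantity to be maximized; that sign convention, like the function names, is an assumption of this sketch.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log p_i, with 0 log 0 taken as 0 (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """H(p; q) = -sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


fair_die = [1 / 6] * 6
skewed_die = [0.1, 0.1, 0.1, 0.2, 0.2, 0.3]   # hypothetical example distribution

if __name__ == "__main__":
    print(round(entropy(fair_die), 4))
    print(round(relative_entropy(skewed_die, fair_die), 4))
```

With a uniform prior the two functions differ only by the constant log n, so maximizing either one selects the same distribution.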


W. Rödder [Röd96, Röd00] and his colleagues at Fern University in Hagen are continuing to develop a very impressive interactive computer program SPIRIT that implements this practical approach to the computation of propositions and conditional propositions and their probabilities. Starting with an initially defined set of variables and their values, the user can input statements and conditional statements about these variables taking various values, and can also assign conditional probabilities to them. The utility of having a variable take one of its values can also be incorporated.

3.3 Confidence in Maximum Entropy Solutions. While the maximum entropy solution provides the most plausible or “most likely” probability distribution for a situation among all of the Bayesian solutions, it does not immediately provide a means for estimating how much confidence to attach to that solution. This issue has been taken up by E. Jaynes [Jay79], S. Amari [Ama85], and A. Caticha [Cat00].

Jaynes puts the matter as follows: “Granted that the distribution of maximum entropy has a favored status, in exactly what sense, and how strongly, are alternative distributions of lower entropy ruled out?” He proves an entropy “concentration theorem” in the context of a generalized experiment of N independent trials each having n possible results and satisfying a set of m (m < n) linearly independent, linear constraints on the observed frequencies of the experiment. Jaynes shows that in the limit as the number N of trials approaches infinity, the fraction F of probability distributions satisfying the m constraints and whose entropy H differs from the maximum by no more than $\Delta H$ is given by the Chi-square distribution $\chi_k^2$ with k = n - m - 1 degrees of freedom as

$$2N(\Delta H) = \chi_k^2(1 - F)$$

That is, the critical threshold entropy value $H_\alpha$ for which only the fraction $\alpha$ of the probability distributions that satisfy the m constraints have smaller entropy is given by

$$H_\alpha = H_{max} - \chi_k^2(1 - \alpha) / 2N.$$

For N = 1000 independent trials of tossing a 6-sided die and with a significance level $\alpha$ = 0.05 and degrees of freedom k = 6 - 1 - 1 = 4, 95% of the eligible probability distributions have entropy no less than $H_{max}$ - 9.49/2N = $H_{max}$ - 0.0047. $H_{max}$ is on the order of 1.7918 for a fair die and 1.6136 for a die with average die value of 4.5 instead of 3.5. Letting $\alpha$ = 0.005 it follows that 99.5% of the eligible distributions will have entropy no less than $H_{max}$ - 14.9/2000 = $H_{max}$ - 0.00745.
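The thresholds quoted for the die example can be reproduced with the standard library alone, because the chi-square distribution with 4 degrees of freedom has the closed-form CDF $1 - e^{-x/2}(1 + x/2)$. The sketch below inverts that CDF by bisection; it is specific to k = 4, and the function names are illustrative. A general-purpose statistics library would normally be used instead.

```python
import math


def chi2_cdf_4dof(x):
    """CDF of the chi-square distribution with 4 degrees of freedom (closed form)."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)


def chi2_quantile_4dof(prob, lo=0.0, hi=100.0, tol=1e-10):
    """Value x with CDF(x) = prob, found by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_4dof(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def entropy_margin(alpha, n_trials):
    """Delta-H = chi2_4(1 - alpha) / (2N) for the k = 4 die example."""
    return chi2_quantile_4dof(1.0 - alpha) / (2.0 * n_trials)


if __name__ == "__main__":
    print(round(chi2_quantile_4dof(0.95), 2))       # critical value at alpha = 0.05
    print(round(entropy_margin(0.05, 1000), 4))     # margin for N = 1000
    print(round(entropy_margin(0.005, 1000), 5))    # margin at alpha = 0.005
```

The computed margin at $\alpha$ = 0.005 is slightly below 0.00745 because the text rounds the critical value to 14.9.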

Clearly  eligible  distributions  that  significantly  deviate  in  entropy  from  the  maximum  value  are 
very  rare.  However  this  result  does  not  directly  answer  the  question  of  how  much  confidence 
to  have  in  the  individual  probabilities  associated  with  distributions  having  maximum  or  almost 
maximum  entropy.  That  is,  can  a  probability  distribution  with  close  to  maximum  entropy 
assign  probabilities  that  are  significantly  different  from  the  probabilities  of  the  maximum 
entropy  distribution? 

For instance, a 6-sided die having two faces with probabilities 1/12 and 1/4 respectively and four faces each having 1/6 probability has entropy 0.0436 less than the maximum of 1.7918 for a fair die. So for N = 1000 independent trials and a significance level of $\alpha$ = 0.05 such a distribution would differ from the maximum entropy value for a fair die by considerably more than 0.0047. However for N = 100, $\Delta H$ = 9.49/200 = 0.047, which is large enough to include such a distribution.
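The entropy deficit of this loaded die, and its position relative to the two thresholds, can be verified directly. The sketch below uses natural logarithms and the rounded critical value 9.49 from the text; the names `loaded_die`, `deficit` and `within_margin` are illustrative.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log p_i (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Two faces at 1/12 and 1/4, the remaining four at 1/6 each.
loaded_die = [1 / 12, 1 / 4, 1 / 6, 1 / 6, 1 / 6, 1 / 6]

# Shortfall from the fair-die maximum log 6.
deficit = math.log(6) - entropy(loaded_die)


def within_margin(entropy_deficit, chi2_value, n_trials):
    """True if the deficit does not exceed Delta-H = chi2_value / (2N)."""
    return entropy_deficit <= chi2_value / (2 * n_trials)


if __name__ == "__main__":
    print(round(deficit, 4))
    print(within_margin(deficit, 9.49, 1000), within_margin(deficit, 9.49, 100))
```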

Furthermore, how does the confidence in the probabilities determined by a maximum entropy solution depend upon the amount of under-specification of the situation that produced that solution? Surely a maximum entropy distribution that relies upon a great deal of ignorance about a situation offers less confidence about the probabilities determined than does a maximum entropy solution that is based upon a minimum of ignorance about the situation. Put another way, confidence about the maximum entropy distribution should be higher when conditional independencies are positively known than when they are merely provisionally assumed.

Amari [Ama85] takes up these issues in the context of differential geometry. Under-specification of information gives rise to a manifold of possible probability distributions. A Riemannian metric on these distributions, introduced early on by C. R. Rao [Rao45], allows a very general approach to quantifying the distance between distributions. This development provides a very general approach to these problems of multiple possible distributions, but so far the results don’t seem to directly apply to the issue of the confidence to be attached to the individual probabilities dictated by a maximum entropy distribution. Unfortunately Amari offers no numerical example to illustrate how these results might be applied to allow a confidence measure to be put upon the probabilities associated with distributions having maximum or close to maximum entropy.

Caticha [Cat00] frames the question along the same lines as Jaynes: “Once one accepts that the
maximum entropy distribution is to be preferred over all others, the question is to what extent
are distributions with lower entropy supposed to be ruled out?” Using a parameterized family
of distributions, Caticha shows how this question can be rephrased as another maximum
entropy problem, but he too offers no simple illustrative example of how his results can be
applied to the question of how much confidence to have in any one probability value
associated with the maximum entropy distribution.

What seems to be needed is a way to solve for the probabilities of specified outcomes in terms
of entropies equal to or close to the maximum entropy. If 95% of the eligible probability
distributions have entropy H no less than H_max - ΔH, then what confidence limits are implied
for the individual probabilities of those distributions?

4. Summary. In order to adequately represent and manipulate explicitly conditional
statements such as “A given B”, the familiar Boolean algebra of propositions or events must be
extended to ordered pairs of such propositions or events. This is quite analogous to the
requirement to extend integers to ordered pairs in order to adequately represent fractions and
allow division. The resulting system of Boolean fractions includes the original propositions and
also allows the non-trivial assignment of conditional probabilities to these Boolean fractions.
Boolean fractions are truth functional in the sense that their truth status is completely
determined by the truth or falsity of the two Boolean components of the fraction. But since there
are two components, the truth status of a Boolean fraction has three possibilities: one when the
condition (denominator) is false and two more when the denominator is true. Just as all integer
fractions with a zero denominator are “undefined”, so too are all Boolean fractions with a false
condition undefined or “inapplicable”. When the condition is true, the truth status of a
Boolean fraction is determined by the truth of the numerator. The four extended operations
(or, and, not, and given) on the Boolean fractions reduce to ordinary Boolean operations when
the denominators are equivalent. Just as with integer fractions, the system of Boolean fractions
has some new properties but loses others that are true in the Boolean algebra of propositions or
events.
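The three-valued truth status described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, `None` is used to stand for “inapplicable”, events are modeled as Python sets over a finite sample space, and the extended operations (or, and, not, given) are deliberately not reproduced here.

```python
from fractions import Fraction
from typing import Optional


def truth_status(numerator: bool, denominator: bool) -> Optional[bool]:
    """Truth status of the Boolean fraction (numerator | denominator).

    Returns None ("inapplicable") when the condition is false; otherwise
    the status is simply the truth value of the numerator.
    """
    if not denominator:
        return None
    return numerator


def conditional_probability(a, b, prob):
    """P(a | b) for events a, b (sets of outcomes) under `prob`, a mapping
    from outcome to probability. Returns None when P(b) = 0, mirroring the
    undefined integer fraction with a zero denominator."""
    p_b = sum(prob[w] for w in b)
    if p_b == 0:
        return None
    return sum(prob[w] for w in a & b) / p_b


# Illustrative sample space: one roll of a fair die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
even = {2, 4, 6}
greater_than_3 = {4, 5, 6}

print(truth_status(True, False))    # condition false: inapplicable
print(truth_status(False, True))    # condition true, numerator false
print(conditional_probability(even, greater_than_3, die))
```

An ordinary proposition A corresponds to the fraction whose condition is the whole sample space, in the same way that an integer n corresponds to n/1.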

A  conditional  statement  is  not  an  implication  or  a  deduction;  it  is  rather  a  statement  in  a  given 
context.  Deduction  of  one  conditional  by  another  can  still  be  defined  in  terms  of  the 
(extended)  operations,  as  is  often  done  in  Boolean  algebra.  Due  to  the  two  components  of  a 
conditional  there  is  a  question  of  what  is  being  implied  when  one  conditional  implies  another. 
It  turns  out  that  several  plausible  implications  between  conditionals  can  be  reduced  to  ordinary 
implications  between  the  Boolean  components  of  the  two  conditionals.  The  applicability, 
truth,  non-falsity  or  inapplicability  of  one  conditional  can  imply  the  corresponding  property  in 
the second conditional. Any two or more of these four elementary implications can be combined
to form a more stringent implication. With respect to any one of these implications, a set
of  conditionals  will  generally  imply  a  larger  set,  and  it  is  now  possible  to  compute  the  set  of  all 
deductions  generated  by  some  initial  set  of  conditionals,  as  illustrated  by  three  examples  in  this 
paper. 
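The reduction of the four elementary implications to ordinary Boolean implications can be sketched as follows. The encoding is an assumption made for illustration: events are frozensets of outcomes in a finite universe, Boolean implication is the subset relation, and the names `Conditional` and `elementary_implications` are hypothetical rather than taken from the paper.

```python
from typing import NamedTuple


class Conditional(NamedTuple):
    """The conditional (a | b): numerator a, condition b, both events."""
    a: frozenset
    b: frozenset


def elementary_implications(c1, c2, universe):
    """Check, for each of the four properties, whether that property of c1
    implies the same property of c2 (implication = subset of outcomes)."""
    def applicable(c):
        return c.b                      # condition holds

    def true_set(c):
        return c.a & c.b                # condition and numerator hold

    def non_false(c):
        return universe - (c.b - c.a)   # not (condition holds, numerator fails)

    def inapplicable(c):
        return universe - c.b           # condition fails

    return {
        "applicability": applicable(c1) <= applicable(c2),
        "truth": true_set(c1) <= true_set(c2),
        "non-falsity": non_false(c1) <= non_false(c2),
        "inapplicability": inapplicable(c1) <= inapplicable(c2),
    }


# Illustrative universe: one roll of a die.
U = frozenset(range(1, 7))
even = frozenset({2, 4, 6})
c1 = Conditional(a=even, b=frozenset({4, 5, 6}))   # "even given more than 3"
c2 = Conditional(a=even, b=U)                      # "even", unconditionally

result = elementary_implications(c1, c2, U)
for name, holds in result.items():
    print(name, holds)
```

A more stringent implication is obtained by requiring two or more of these checks to hold together, for example `result["truth"] and result["non-falsity"]`.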

While computations can be done in principle, in practice the complexity of partial and
uncertain conditional information precludes the possibility of solving for all possible
probability distributions that satisfy the partial constraints. What is feasible and already
successfully implemented in the program SPIRIT is to compute the distribution with maximum
information entropy. However, the amount of confidence that can be associated with the
probabilities assigned by this “most likely”, maximum entropy distribution is still an open question.
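As a toy illustration of computing a maximum entropy distribution under a partial constraint, the sketch below solves the classic case of a die whose only known property is its mean. It is not SPIRIT's algorithm, and it does not handle conditional constraints. It relies on the standard result that the maximum entropy distribution under a mean constraint has the form p_i ∝ exp(λ v_i), and it finds λ by bisection. The function name, the bracketing interval, and the target mean of 4.5 are assumptions made for the example.

```python
import math


def maxent_with_mean(values, target_mean, tol=1e-12):
    """Maximum entropy distribution over `values` with the given mean.

    Assumes min(values) < target_mean < max(values). The solution has the
    exponential form p_i proportional to exp(lam * v_i); lam is found by
    bisection because the mean is increasing in lam.
    """
    def distribution(lam):
        weights = [math.exp(lam * v) for v in values]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(distribution(lam), values))

    lo, hi = -50.0, 50.0  # assumed bracket, wide enough for small examples
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return distribution((lo + hi) / 2.0)


faces = [1, 2, 3, 4, 5, 6]
probs = maxent_with_mean(faces, 4.5)
print([round(p, 4) for p in probs])
```

With a target mean of 3.5 the same routine returns the uniform distribution, which is the unconstrained maximum entropy solution for a die.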